An Approximate Nearest Neighbor Retrieval Scheme for Computationally Intensive Distance Measures
Abstract
Nearest neighbor retrieval can be defined as the task of finding, in a given database of objects, the objects that are most similar to a query. It finds application in areas ranging from medicine, finance, computer vision, and the computational sciences to computational geometry and information retrieval. With the expansion of the internet, the amount of digitized data is increasing by leaps and bounds, and retrieving nearest neighbors accurately and efficiently becomes challenging because databases contain a large number of objects. The problem worsens when the underlying distance measure used to compute (dis)similarity is computationally expensive. In such a scenario, a sequential scan of the data takes a long time, which is the biggest problem for any online retrieval system.
For example, in biometric authentication systems, a person's biometric template is compared against all registered samples in a database to identify that person. This process can be extremely time-consuming in large databases even if the matching algorithm is very fast. Concretely, a background check of a person crossing the border using the complete IAFIS (a biometric person identification system used at U.S. border crossings) requires around 55 million comparisons. Even with state-of-the-art matching algorithms and computing facilities, this would take close to 10 minutes, which is impractical considering the millions of people who cross the border every month. Even for criminal investigations, a quick approximate search done immediately is preferable to the typical turnaround time of a few days. This thesis proposes a novel method for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures.
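The sequential scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the names `CountingDistance`, `edit_distance`, and `sequential_scan_knn` are hypothetical, and Levenshtein edit distance stands in for an expensive, non-Euclidean measure such as a biometric matcher.

```python
import heapq
from typing import Any, Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming: O(len(a) * len(b)) work
    per call, standing in for an expensive, non-Euclidean distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (cost 0 if characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class CountingDistance:
    """Wraps any distance function and counts how often it is evaluated."""

    def __init__(self, fn: Callable[[Any, Any], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, a: Any, b: Any) -> float:
        self.calls += 1
        return self.fn(a, b)


def sequential_scan_knn(query: Any, database: Sequence[Any],
                        dist: Callable[[Any, Any], float],
                        k: int = 1) -> List[Tuple[float, int]]:
    """Exact k-NN by brute force: the query is compared against every object,
    so the cost is len(database) distance evaluations."""
    scored = ((dist(query, obj), idx) for idx, obj in enumerate(database))
    # Keep the k smallest (distance, index) pairs; ties break by index.
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = ["kitten", "sitting", "mitten", "bitten", "written"]
    d = CountingDistance(edit_distance)
    print(sequential_scan_knn("sitten", db, d, k=2), d.calls)
    # → [(1, 0), (1, 2)] 5
```

The number of exact distance evaluations always equals the database size, so query cost grows linearly with the database. At the figures quoted above, 55 million comparisons in about 10 minutes (600 seconds) corresponds to roughly 55,000,000 / 600 ≈ 92,000 comparisons per second.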
The proposed technique is domain-independent and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. The main contributions of our work are:
• A representation scheme for objects in a dataset that allows fast retrieval of approximate nearest neighbors in non-Euclidean spaces. The approach, named Hierarchical Local Maps (HLM), makes use of manifold learning techniques to compute linear approximations of local neighborhoods.
• A search mechanism, combined with a filter-and-refine approach, that minimizes the number of exact distance computations for computationally expensive distance measures.
• A study of the performance of our scheme on biometric data and of the parameters affecting it.
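The filter-and-refine idea named in the second contribution can be illustrated with a generic sketch. It does not implement HLM, whose construction the abstract does not specify. Instead, as an assumption for illustration, the filter step ranks every object with a cheap proxy distance (the hypothetical `length_gap`, the absolute difference in string length, which is a lower bound on edit distance), and only the top `p` candidates are passed to the expensive exact measure. All function names and the parameters `k` and `p` are illustrative.

```python
import heapq
from typing import Any, Callable, List, Sequence, Tuple

Distance = Callable[[Any, Any], float]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: stand-in for an expensive exact distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def length_gap(a: str, b: str) -> int:
    """Hypothetical cheap proxy: |len(a) - len(b)| lower-bounds edit distance."""
    return abs(len(a) - len(b))


def filter_and_refine_knn(query: Any, database: Sequence[Any],
                          cheap_dist: Distance, exact_dist: Distance,
                          k: int = 1, p: int = 10) -> List[Tuple[float, int]]:
    """Approximate k-NN retrieval with a filter-and-refine scheme.

    Filter: rank all objects by the cheap proxy and keep the p best indices.
    Refine: evaluate the exact distance only on those p candidates and
    return the k best (exact_distance, index) pairs.
    """
    p = max(p, k)  # at least k candidates are needed to return k neighbors
    candidates = heapq.nsmallest(
        p, range(len(database)), key=lambda i: cheap_dist(query, database[i]))
    refined = ((exact_dist(query, database[i]), i) for i in candidates)
    return heapq.nsmallest(k, refined)


if __name__ == "__main__":
    db = ["kitten", "sitting", "mitten", "bitten", "written", "cat", "elephant"]
    print(filter_and_refine_knn("sitten", db, length_gap, edit_distance, k=2, p=4))
```

Here the exact measure is evaluated only `p` times instead of `len(database)` times. The trade-off is that a true neighbor ranked outside the top `p` by the proxy is missed, which is what makes the result approximate; a more faithful proxy lets a smaller `p` retain the same accuracy.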
Similar resources
Classification, with Applications to Object and Shape Recognition in Image Databases
Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time,...
Adaptive approximate nearest neighbor search for fractal image compression
Fractal image encoding is a computationally intensive method of compression due to its need to find the best match between image subblocks by repeatedly searching a large virtual codebook constructed from the image under compression. One of the most innovative and promising approaches to speed up the encoding is to convert the range-domain block matching problem to a nearest neighbor search pro...
Improving k-Nearest Neighbor Rule: Using Geometrical Neighborhoods and Manifold-based metrics
Sample weighting and variations in neighborhood or data-dependent distance metric definitions are three principal directions considered for improving k-NN classification technique. Recently, manifold-based distance metrics attracted considerable interest and computationally less demanding approximations are developed. However, a careful comparison of these alternative approaches is missing. In ...
A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing
Nowadays, Nearest Neighbor Search becomes more and more important when facing the challenge of big data. Traditionally, to solve this problem, researchers mainly focus on building effective data structures such as hierarchical k-means tree or using hashing methods to accelerate the query process. In this paper, we propose a novel unified approximate nearest neighbor search scheme to combine the...
Randomly Projected KD-Trees with Distance Metric Learning for Image Retrieval
Efficient nearest neighbor (NN) search techniques for high-dimensional data are crucial to content-based image retrieval (CBIR). Traditional data structures (e.g., kd-tree) usually are only efficient for low dimensional data, but often perform no better than a simple exhaustive linear search when the number of dimensions is large enough. Recently, approximate NN search techniques have been propo...